tl;dr
Astrocytes make up about a third of the cells in your brain, and form a second, distinct brain network from the more-studied neuronal network. As neuroscience has discovered in only the last few years, the astrocyte network carries out extremely important computations to integrate information from across different brain networks, and to use this information to control physiological state and maintain long-term homeostasis. To help push the field’s understanding of astrocytes’ unique role forward, we’re releasing OpenAstrocytes—the largest repository of structured astrocyte calcium dynamics to date, drop-in ready for use in AI workflows.
To take a look at it yourself, run
pip install astrocytes
Background
The human brain is about one-third neurons. But two-thirds of it isn’t; and, in recent years, the neuroscience field has become more acutely aware of the incredible importance of the roles played by those other sorts of cells.
One of those other thirds is made up of astrocytes—a crucially important subdivision of the brain’s little-understood glia (from “glue”, referencing old theories of their supposed passivity in brain function). Astrocytes form a cellular network woven between the neurons Figure 1, and as recent work has demonstrated (1), this network encodes rich information about local and global cheical physiological state.
All biological cells must incorporate a massive number of data-points, coming both from the outside world and also from the internal processes of the cell; indeed, this binding-together of disparate interactions into a shared, coherent, soliton-like packet is, perhaps, what makes life life. While all the diverse kinds of cells across the entire diverse tree of life have unique approaches to sensing and making sense of these signals, because of shared evolutionary history, a few common structural threads span vast swaths of this diversity. One of those central commonalities is the use of intracellular calcium signaling.
But astrocytes have a unique, and incredibly rich, way of signaling with calcium; the best way to appreciate it is to look at it by eye:
Naturalistic astrocyte calcium activity is composed of discrete calcium events—high-dimensional, complex, activity that is dynamically coupled to both neuronal activity and activity within the rest of the astrocyte network. While recent work using more classical ML techniques has gotten us closer to cracking this “calcium code” in the astrocyte network, the rise of today’s AI tools raises the possibility of being able to find so many more features in this rich imaging that we, by eye, could never have even conceived of. What’s needed is a data repository designed from the ground up to take advantage of these tools.
Architecture
The best biological data science today is heavily leveraging the advances that have been made in AI over the last few years, and taking advantage of these tools to understand the latent structure of new biological datasets. We have previously used advances in machine learning to get at new features of our data that have led us to exciting new biology (1); but harnessing the full power of the methods that have risen to prominence over the last few years—and the new infrastructure and logistical challenges they present—has necessitated a ground-up reconceptualization of how we work with data at Forecast.
In particular, we wanted to take advantage of both:
- the ease and flexibility provided by streaming data for prototyping and deploying AI workflows; and,
- the richness of structured data that allows us to quickly make sense of the different metadata elements for samples or batches streamed in.
The former of these points is crucial for the nuts and bolts of scaling model training and inference, and has motivated advances in data tooling for AI like WebDatasets (3). The latter, we believe strongly, is a crucial point of departure for biomedical data; while immense progress has been made across domains by taking advantage of the incredible empirical power that lies in scaling, we believe architecting workflows at their foundation around keeping track of the richness of data’s explicit structure—and, most crucially, the structured relationships between those explicit data structures (4)—is a central part of how to leverage the power of these methods for biological applications, where the intrinsic squishy messiness makes purely tabula rasa inference of latent structure much more of an uphill battle.
Backbone: Distributed, typed WebDatasets with atdata
To realize this, we built the OpenAstrocytes dataset on atdata, a new open-source package for structured, distributed, and interoperable streaming datasets we’re developing at Forecast to aid in AI training and inference, both in our own work and for the larger investigative community.
atdata internally was to have fast infrastructure for putting together our rich, multimodal datasets that speeds up our iteration cycle for new AI R&D; we’re excited about the features we’re working on rolling out to the community soon!atdata is built on the open WebDataset format, designed for use as a high-performance AI I/O layer used by many other research groups, and so is able to take advantage of their streaming, caching, smart batching, and drop-in PyTorch connections.
atdata adds functionality that makes it straightforward to understand structured data from remote repositories:
import astrocytes
# Dataset sample type schema
from astrocytes.schema import BathApplicationFrame
dataset = astrocytes.data.bath_application \
.as_type( BathApplicationFrame )
# Pop off a single sample
exemplar = next( x for x in dataset.ordered( batch_size = None ) )
# Pop off and z-project a single batch
batch = next( x for x in dataset.ordered( batch_size = 60) )
batch_summary = np.quantile( batch.image[:, 0], 0.75, axis = 0 )
with HouseStyle() as s:
fig, ax = plt.subplots( figsize = (8, 8) )
plot_micrograph( s, batch_summary, ax,
scale_x = exemplar.scale_x,
scale_y = exemplar.scale_y,
scale_bar = 100.,
clim = (0, 140),
)Here, the as_type method is hiding logic under the hood that takes advantage of pre-implemented bidirectional lenses that are opinionated about the way that generic microscopy images should be interpreted, if possible, as data coming from the specific experiments in the OpenAstrocytes data. These lenses have a rich compositional structure, and we are excited about the possibilities that follow downstream from starting at bedrock by keeping track of
The full documentation for atdata can be found on GitHub.
Datasets
Our initial release of OpenAstrocytes is built from two large collections of previously-published calcium imaging from mouse astrocytes, originally analyzed in (5) and (1). In total, these datasets comprise in total about 1M images, making this release among the the largest open datasets of astrocyte calcium dynamics to date, to our knowledge.
This initial release of the OpenAstrocytes corpus contains two-photon imaging of astrocyte intracellular calcium. These data were recorded during two different experimental setups:
- Bath application (
astrocytes.data.bath_application): in these experiments, ex vivo visual cortical slices from mice were exposed to one of two different agents (baclofen or t-ACPD) after a baseline period, with the agents added to the solution bathing the full slice after the time of addition. - Two-photon photo-uncaging (
astrocytes.data.uncaging): in these experiments, ex vivo visual cortical slices from mice were exposed to one of two caged neurotransmitters (RuBi-GABA or RuBi-glutamate), with a pulse (or pulses) of a second laser applied after a baseline period to “free” the GABA or glutamate from the ruthenium bipyridine cage rendering it inert initially, but only within the small, ~10µm locus directly stimulated by the second laser (6).
If you want to play around with OpenAstrocytes, you can add it to your python project with
# with `pip`:
pip install astrocytes
# or, with `uv`:
uv add astrocytes
Demo
Let’s take a look at some of the fun structure that we have in the OpenAstrocytes data. Here, we’ll be digging into one of the experiments from (1), in which ex vivo slices from mouse visual cortex had two drugs—baclofen and t-ACPD1—bath-applied across the slice. Here, we’ll use the OpenAstrocytes dataset with contemporary tools to give views into the differences between these two compounds’ effects on brain networks.
1 Baclofen acts as an agonist of the GABAB receptor; t-ACPD acts as an agonist of the metabotropic glutamate receptor subtypes mGluR2 and mGluR3
At the outset, we’ll tuck away a few parameters of this experimental setup that will come up across analyses:
Code
t_intervene = 300
"The applied compound begins to enter the bath at 300s after recording onset"Approach
In previous work (1), we identified the differences in astrocyte networks’ responses to baclofen and t-ACPD by using a spatial-temporal segmentation technique known as graphical time warping (7), as implemented in the AQuA toolbox (2). However, recent advances in large, self-supervised image models like DINOv3 (8) .
The overall approach presented in this demo is to run inference on the OpenAstrocytes bath application data using one of the vanilla DINOv3 models (dinov3-vit7b16-pretrain-lvd1689m), and leverage the fact that this model’s internal architecture separates out its image feature representations into patch embeddings. Each microscopy image is separated by the model into a 14-by-14 grid of image patches, with each patch corresponding to a ~60µm square portion of the imaged slice. At each time point over the course of an individual recorded movie, DINOv3 provides each one of these patches with a time series of 4096-dimensional embeddings, corresponding to its decomposition of the image into representative features as learned from its self-supervision over a vast swath of (principally non-biological, but with some exceptions) images.
Here, we use a simple linear basis change (via principal component analysis) to find a 64-dimensional subspace of this 4096-dimensional space already learned within the DINOv3 self-superevision objective that captures the subset of overall image features that appear in individual astrocyte calcium patches. This subspace captures both the image feature variability across different local astrocyte network morphologies as imaged in individual patches (as in Figure 8), as well as the image feature variability induced by the specific changes of the applied pharmacology in the bath application experiment. As it turns out, the richness in these feature representations—even in the absence of any biology-specific fine-tuning!—is enough to decode the identity of the applied compound from the calcium fluorescence pattern within a single ~60µm patch at a single time-point.
DINOv3 patch embeddings capture rich, dynamic responses within astrocyte networks
First, we’ll take a look at the latent structure of local patches of astrocyte network calcium, and how those features unfold after application of pharmacology.
Individual patch responses
First, we’ll download a representative movie from the OpenAstrocytes bath application dataset—the same recording we projected above in Figure 3 to take a look at the overall anatomy of the slice—and examine the behavior of a couple representative patches from it.
Code
# Snag a convenient smoother for image preprocessing
from scipy.ndimage import gaussian_filter
## Constants
N_PATCHES_Y = 14
N_PATCHES_X = 14
## Data hyperparameters
test_patches = [
(10, 12),
(3, 7),
]
"Indices (vertical, horizontal) of the demo patches to show"
first_movie_id = 'd22f6a65-b3e2-46c2-ba5c-551920af1fe3'
"Microscope-supplied UUID of the demo movie to show"
##
movie_frames = _get_movie( first_movie_id, verbose = False )
first_image = movie_frames[0].image[0]
patch_size_y = first_image.shape[0] / N_PATCHES_Y
patch_size_x = first_image.shape[1] / N_PATCHES_X
test_patch_frames = []
for cur_i, cur_j in test_patches:
patch_idx_y = ( int( np.floor( patch_size_y * cur_i ) ),
int( np.ceil( patch_size_y * (cur_i + 1) ) ) )
patch_idx_x = ( int( np.floor( patch_size_y * cur_j ) ),
int( np.ceil( patch_size_y * (cur_j + 1) ) ) )
test_patch_frames.append(
np.array( [
frame.image[0][patch_idx_y[0]:patch_idx_y[1], :][:, patch_idx_x[0]:patch_idx_x[1]]
for frame in movie_frames
] )
)
test_patch_frames_filtered = [
gaussian_filter( fs, sigma = (3., 0.6, 0.6) )
for fs in test_patch_frames
]
## Figure parameters
patch_cmax = 110
"Maximum intensity value for patch snapshot figure"
i_frame_compare = (250, 350, 450, 550, 650, 750)
"Indices of time-slices to display for patch snapshot figure"
#
dt = movie_frames[3].t
n_patches = len( test_patch_frames_filtered )
n_frame_compare = len( i_frame_compare )
##
with HouseStyle() as s:
fig, axs = plt.subplots( n_patches, n_frame_compare,
figsize = (10, 4),
sharey = True,
)
for i_patch in range( n_patches ):
cur_patch_frames = test_patch_frames_filtered[i_patch]
for i_t in range( n_frame_compare ):
axs[i_patch, i_t].imshow(
cur_patch_frames[i_frame_compare[i_t], :, :],
#
clim = (0, patch_cmax),
cmap = 'afmhot',
)
if i_patch == 0:
t_val = i_frame_compare[i_t] * dt - t_intervene
axs[i_patch, i_t].set_title( f'{t_val:+0.0f}s', fontsize = 12 )
if i_patch == (n_patches - 1) and i_t == 0:
axs[i_patch, i_t].tick_params( axis = 'x', length = 3 )
axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['0', f'{movie_frames[0].scale_x * patch_size_x:0.0f} µm'], ha = 'left' )
else:
axs[i_patch, i_t].tick_params( axis = 'x', length = 0 )
axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
axs[i_patch, i_t].set_yticks( [] )
if i_t == 0:
axs[i_patch, i_t].set_ylabel( f'Patch {i_patch+1}' )
plt.subplots_adjust( wspace = 0.04, hspace = 0. )These two patches are good illustrations of the “median” structure of this imaging data: both patches exhibit some changes after compound application (compare the right four columns with the left two); but, these changes can be subtle to the naked eye, as is particularly the case with Patch 2, shown in the bottom row of panels.
However, the image embedding model is able to pull out a good deal of structure in its latent feature embeddings—and, in particular, in dimensions of that embedding space maximally capturing the OpenAstrocytes corpus (i.e., our PCs). In particular, we see that there are axes within the patch-embedding space (PCs) that capture robust changes in the captured imaging induced by the applied compound, even if those changes were a bit more subtle to the naked eye:
Code
from astrocytes._datasets._embeddings import PatchEmbeddingTrace
## Figure params
baseline_window = (-90, -30)
plot_window = (-100, 240)
panel_verbose = False
#
wds_url = (
'https://data.forecastbio.cloud'
+ '/testing/patch-pc-traces/bath-application/'
+ 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
)
ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )
it = ds.ordered( batch_size = None )
if panel_verbose:
it = tqdm( it )
#
test_traces = [ None for _ in test_patches ]
for trace in it:
try:
assert trace.metadata is not None and 'uuid' in trace.metadata
assert trace.metadata['uuid'] == first_movie_id
for i_patch, (cur_i, cur_j) in enumerate( test_patches ):
if trace.i_patch == cur_i and trace.j_patch == cur_j:
test_traces[i_patch] = trace
should_finish = True
for x in test_traces:
if x is None:
should_finish = False
if should_finish:
break
except:
# print( trace.metadata['uuid'], trace.i_patch, trace.j_patch )
continue
#
import matplotlib.pyplot as plt
from forecast_style import HouseStyle
def _baseline_normalize( ts, ys, window ):
filter_cur = (
(ts >= window[0])
& (ts < window[1])
)
mean = np.mean( ys[filter_cur] )
std = np.std( ys[filter_cur] )
return (ys - mean) / std, mean, std
#
causal_offset = (84 * dt) / 4.
t_rel = test_traces[0].ts - t_intervene + causal_offset
with HouseStyle( grids = True ) as s:
fig, axs = plt.subplots( n_patches, 1,
figsize = (7, 7),
sharex = True,
# sharey = True,
)
for i_patch in range( n_patches ):
ax = axs[i_patch]
cur_trace = test_traces[i_patch]
values_0_norm, mean_0, std_0 = \
_baseline_normalize( t_rel, cur_trace.values[0, :], baseline_window )
values_1_norm, mean_1, std_1 = \
_baseline_normalize( t_rel, cur_trace.values[1, :], baseline_window )
values_2_norm, mean_2, std_2 = \
_baseline_normalize( t_rel, cur_trace.values[4, :], baseline_window )
filter_plot = (
(t_rel >= plot_window[0])
& (t_rel < plot_window[1])
)
ax.plot( t_rel[filter_plot], values_0_norm[filter_plot], 'C0-', label = 'PC 1' )
ax.plot( t_rel[filter_plot], values_1_norm[filter_plot], 'C2-', label = 'PC 2' )
ax.plot( t_rel[filter_plot], values_2_norm[filter_plot], 'C3-', label = 'PC 5' )
yl = (-5.5, 10.5)
ax.fill_between( [0, plot_window[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0 )
ax.plot( plot_window, [0, 0], 'k-', linewidth = 1 )
ax.plot( [plot_window[0], plot_window[0]], yl, 'k-', linewidth = 1 )
ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
# ax.set_yticks( [0, 5, 10] )
ax.set_xlim( plot_window )
ax.set_ylim( yl )
ax.set_ylabel( f'Patch {i_patch + 1}' )
if i_patch == (n_patches - 1):
ax.set_xlabel( 'Time (s)' )
# ax.set_ylabel( '∆ (baseline SD)' )
plt.legend( fontsize = 12, loc = 'upper left' )
ax.text( 5, 8.5, '+ Drug', alpha = 0.5 )Response heterogeneity across the imaging field
The traces above depict only three of the 64 PCs we used as our bath application–specific subspace—and, only two of the 196 (14-by-14) patches yielded from inference on this experimental session.
When we look at the feature dynamics across all patches within the imaging field, the full richness of this method in characterizing becomes apparent:
Code
from astrocytes._datasets._embeddings import PatchEmbeddingTrace
#
def _get_movie_traces( movie_uuid: str, verbose: bool = False ) -> list[list[PatchEmbeddingTrace]]:
# TODO Created `wids` indexed WebDataset for easier indexing in demos
wds_url = (
'https://data.forecastbio.cloud/testing/patch-pc-traces/bath-application/'
+ 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
)
ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )
started_movie = False
movie_traces: list[PatchEmbeddingTrace] = []
if verbose:
print( 'Iterating traces...' )
for trace in ds.ordered( batch_size = None ):
try:
assert trace.metadata is not None
assert 'uuid' in trace.metadata
if trace.metadata['uuid'] == movie_uuid:
if verbose and not started_movie:
print( 'Snagging correct movie...' )
started_movie = True
movie_traces.append( trace )
else:
if started_movie:
# We've moved to a new movie uuid, and so finished up
if verbose:
print( 'Got it!' )
break
except:
print( 'Skipping trace - no metadata' )
continue
# Arrange nicely
ret = [
[
None
for j in range( N_PATCHES_X )
]
for i in range( N_PATCHES_Y )
]
for trace in movie_traces:
ret[trace.i_patch][trace.j_patch] = trace
return ret
##
# Collate full dataset
first_movie_traces = _get_movie_traces( first_movie_id, verbose = False )
# Do plot
i_pc = 0
with HouseStyle() as s:
fig, axs = plt.subplots( N_PATCHES_Y, N_PATCHES_X,
figsize = (16, 12),
# sharex = True,
# sharey = True,
)
for i in range( N_PATCHES_Y ):
for j in range( N_PATCHES_X ):
ax = axs[i, j]
trace = first_movie_traces[i][j]
ts_rel = trace.ts - t_intervene + causal_offset
cur_values_norm, cur_mean, cur_std = \
_baseline_normalize( t_rel, trace.values[i_pc, :], baseline_window )
filter_plot = (
(ts_rel >= plot_window[0])
& (ts_rel < plot_window[1])
)
ax.plot( ts_rel[filter_plot], cur_values_norm[filter_plot], f'C0-', linewidth = 1.5 )
xl = ax.get_xlim()
ax.plot( xl, [0, 0], 'k-', linewidth = 1, zorder = -200 )
# yl = ax.get_ylim()
yl = (-10.5, 10.5)
ax.fill_between( [0, xl[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0, zorder = 100 )
if (i, j) in test_patches:
ax.fill_between( [xl[0], xl[1]], yl[0], yl[1], facecolor = 'none', edgecolor = f'k', alpha = 0.7, linewidth = 2., zorder = -300 )
ax.set_xlim( xl )
ax.set_ylim( yl )
ax.set_yticks( [] )
ax.set_xticks( [] )
for i in range( 1, N_PATCHES_X ):
axs[-1, i].set_xticks( [-90, 0, 240], ['', '', ''] )
axs[-1, 0].set_xticks( [-90, 0, 240], ['–90', '0', '240'], fontsize = 8, rotation = 90 )
for i in range( 0, N_PATCHES_Y - 1 ):
# axs[i, 0].set_yticks( [-10, 0, 10], ['', '', ''] )
pass
axs[-1, 0].set_yticks( [-10, 0, 10], ['-10', '0', '10'], fontsize = 8 )
fig.subplots_adjust( wspace = 0.1, hspace = 0.1 )Code
# Do plot
i_pc = 1
with HouseStyle() as s:
fig, axs = plt.subplots( N_PATCHES_Y, N_PATCHES_X,
figsize = (16, 12),
# sharex = True,
# sharey = True,
)
for i in range( N_PATCHES_Y ):
for j in range( N_PATCHES_X ):
ax = axs[i, j]
trace = first_movie_traces[i][j]
ts_rel = trace.ts - t_intervene + causal_offset
cur_values_norm, cur_mean, cur_std = \
_baseline_normalize( t_rel, trace.values[i_pc, :], baseline_window )
filter_plot = (
(ts_rel >= plot_window[0])
& (ts_rel < plot_window[1])
)
ax.plot( ts_rel[filter_plot], cur_values_norm[filter_plot], f'C2-', linewidth = 1.5, zorder = 200 )
xl = ax.get_xlim()
ax.plot( xl, [0, 0], 'k-', linewidth = 1, zorder = -200 )
# yl = ax.get_ylim()
yl = (-15.5, 15.5)
ax.fill_between( [0, xl[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0, zorder = 100 )
if (i, j) in test_patches:
ax.fill_between( [xl[0], xl[1]], yl[0], yl[1], facecolor = 'none', edgecolor = f'k', alpha = 0.7, linewidth = 2., zorder = -300 )
ax.set_xlim( xl )
ax.set_ylim( yl )
ax.set_yticks( [] )
ax.set_xticks( [] )
for i in range( 1, N_PATCHES_X ):
axs[-1, i].set_xticks( [-90, 0, 240], ['', '', ''] )
axs[-1, 0].set_xticks( [-90, 0, 240], ['–90', '0', '240'], fontsize = 8, rotation = 90 )
for i in range( 0, N_PATCHES_Y - 1 ):
# axs[i, 0].set_yticks( [-10, 0, 10], ['', '', ''] )
pass
axs[-1, 0].set_yticks( [-15, 0, 15], ['-15', '0', '15'], fontsize = 8 )
fig.subplots_adjust( wspace = 0.1, hspace = 0.1 )Components of static anatomy and dynamic physiology in patch embeddings
Because the original PCA used to detemrine the bath application–specific subspace of DINOv3 patch embeddings was performed across the full dataset, the image feature variance optimized in the method incorporated contributions from two distinct facets of the image structure: first, the contribution from variations across the static astrocyte network anatomical architectures imaged within individual patch; and second, the physiological dynamics that unfold within an individual patch.
In order to appreciate the structure of these two distinct contributions within the PC subspace projections in the bath application dataset, we can take a look at a handful of representative patches to visualize the way astrocyte network patches “dance” around PC-space:
Code
## Figure parameters
random_seed = 89
n_traces_show_raw = 100
##
rng = np.random.default_rng( random_seed )
traces_show_raw = rng.permutation( n_traces_show )[:n_traces_show_raw]
##
with HouseStyle() as s:
fig = plt.figure( figsize = (10, 10) )
ax = fig.add_subplot( projection = '3d' )
ax.view_init( azim = -62 )
#
it = traces_show_raw
if panel_verbose:
it = tqdm( it )
for i_trace in it:
# trace start
ax.scatter(
trace_data_raw_show[i_trace, 0, 0],
trace_data_raw_show[i_trace, 1, 0],
trace_data_raw_show[i_trace, 2, 0],
color = 'k',
s = 3
)
# trace end
ax.scatter(
trace_data_raw_show[i_trace, 0, -1],
trace_data_raw_show[i_trace, 1, -1],
trace_data_raw_show[i_trace, 2, -1],
facecolor = f'C{i_trace % 4}',
edgecolor = 'k',
linewidth = 0.5,
s = 8
)
# trajectory
ax.plot(
trace_data_raw_show[i_trace, 0, :],
trace_data_raw_show[i_trace, 1, :],
trace_data_raw_show[i_trace, 2, :],
f'C{i_trace % 4}-',
linewidth = 0.5,
alpha = 0.8,
)
ax.set_xticks( [0], '' )
ax.set_yticks( [0], '' )
ax.set_zticks( [0], '' )
ax.set_xlabel( f'PC {pcs_show[0]+1}' )
ax.set_ylabel( f'PC {pcs_show[1]+1}' )
ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )
ax.set_xlabel( f'PC {pcs_show[0]+1}', rotation = -14 )
ax.set_ylabel( f'PC {pcs_show[1]+1}', rotation = 37 )
ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )
ax.set_box_aspect((1, 1, 1.1), zoom=0.85)As expected, the largest contributor to the heterogeneity of individual frames’ latent embeddings appears to come from the differences in anatomical structure—that is, from the regularities of the frames’ image content arising from the difference in the shape of the cells within the specific patch—as opposed to physiologic changes within a static anatomical architecture. We see this in the distinctive separation of the trajectories of individual patches within embedding-space: while the positions of individual patches do change over time across the recording, these dynamic changes tend to be small relative to the separation between full patch trajectories.
However, interestingly, those (relatively small) dynamic deflections do also appear to posess their own distinct structure; this is accentuated by plotting the patch trajectories normalized to each individual patch’s baseline period:
Code
with HouseStyle() as s:
fig = plt.figure( figsize = (10, 10) )
ax = fig.add_subplot( projection = '3d' )
ax.view_init( elev = 20 )
# for i_trace in tqdm( traces_show_raw ):
for i_trace in traces_show_raw:
# start
ax.scatter(
trace_data_show[i_trace, 0, 0],
trace_data_show[i_trace, 1, 0],
trace_data_show[i_trace, 2, 0],
color = 'k',
s = 3
)
# end
ax.scatter(
trace_data_show[i_trace, 0, -1],
trace_data_show[i_trace, 1, -1],
trace_data_show[i_trace, 2, -1],
facecolor = f'C{i_trace % 4}',
edgecolor = 'k',
linewidth = 0.5,
s = 8
)
# trajectory
ax.plot(
trace_data_show[i_trace, 0, :],
trace_data_show[i_trace, 1, :],
trace_data_show[i_trace, 2, :],
f'C{i_trace % 4}-',
linewidth = 0.5,
alpha = 0.8,
)
ax.set_xlim( -32, 32 )
ax.set_ylim( -32, 32 )
ax.set_zlim( -12, 52 )
ax.tick_params(axis='y', which='major', pad=-5)
ax.set_xticks( [-30, 0, 30], ['', '', ''], fontsize = 12 )
ax.set_yticks( [-30, 0, 30], ['–30', '', '+30'], rotation = -55, ha = 'left', fontsize = 12 )
ax.set_zticks( [-10, 0, 50], ['–10', '', '+50'], fontsize = 12 )
ax.set_xlabel( f'PC {pcs_show[0]+1}', rotation = -14 )
ax.set_ylabel( f'PC {pcs_show[1]+1}', rotation = 37 )
ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )
ax.set_box_aspect( (1, 1, 1.1), zoom=0.85 )
# fig.tight_layout()Put together, these results indicate that the latent structure of patch embeddings from DINOv3—a large, self-supervised image model without any specific biological emphasis in training—contains subspaces rich in information about both the anatomical structure of cells within the astrocyte network (as captured through two-photon imaging of our particular intracellular calcium indicator). This is a remarkable testament to both the intricate order of the compositional structure within image data writ large, as well as to the remarkable power that the self-supervised learning objective has in yielding models able to internalize many highly-nontrivial features of this intricate structure within the statistics of naturalistic images, even within highly-specific domains.
Image embedding features linearly separate physiological impacts of distinct compounds
The richness, in particular, of these embeddings’ portrait of physiologic changes within patches after the application of pharmacology—even if not the largest contributor to the layout of embeddings within a vanilla large image model like DINOv3—raises the question of whether a small subspace of imaging features contains information about the identity of the applied compound. This would extend our prior work demonstrating that specific more classical features of astrocyte calcium dynamics can encode this information at the more nonlocal level of full imaging fields (1) to a potential encoding within individual small network patches.
To investigate the extent of these local differences, we can look at one of the same representatives patch we did in Figure 4, but across multiple chemical exposures:
Code
test_patch_compare = test_patches[0]
movies_compare = [
first_movie_id,
comparison_movie_id,
]
frame_blocks_compare = [
movie_frames,
comparison_movie_frames,
]
test_patch_frames_compare = []
cur_i, cur_j = test_patch_compare
for cur_frames in frame_blocks_compare:
patch_idx_y = ( int( np.floor( patch_size_y * cur_i ) ),
int( np.ceil( patch_size_y * (cur_i + 1) ) ) )
patch_idx_x = ( int( np.floor( patch_size_y * cur_j ) ),
int( np.ceil( patch_size_y * (cur_j + 1) ) ) )
test_patch_frames_compare.append(
np.array( [
frame.image[0][patch_idx_y[0]:patch_idx_y[1], :][:, patch_idx_x[0]:patch_idx_x[1]]
for frame in cur_frames
] )
)
test_patch_frames_compare_filtered = [
gaussian_filter( fs, sigma = (3., 0.6, 0.6) )
for fs in test_patch_frames_compare
]
#
dt = movie_frames[3].t
t_intervene = 300
cmax = 70
i_frame_compare = (250, 350, 450, 550, 650, 750)
n_series = len( test_patch_frames_compare )
n_frame_compare = len( i_frame_compare )
#
with HouseStyle() as s:
fig, axs = plt.subplots( n_patches, n_frame_compare,
figsize = (10, 4),
sharey = True,
)
for i_series in range( n_series ):
cur_patch_frames = test_patch_frames_compare_filtered[i_series]
cur_compound = frame_blocks_compare[i_series][0].applied_compound
for i_t in range( n_frame_compare ):
ax = axs[i_series, i_t]
ax.imshow(
cur_patch_frames[i_frame_compare[i_t], :, :],
#
clim = (-cmax, cmax),
cmap = COMPOUND_CMAPS[cur_compound],
)
if i_series == 0:
t_val = i_frame_compare[i_t] * dt - t_intervene
ax.set_title( f'{t_val:+0.0f}s', fontsize = 12 )
if i_series == (n_patches - 1) and i_t == 0:
ax.tick_params( axis = 'x', length = 3 )
ax.set_xticks( [-0.5, patch_size_x+0.5], ['0', f'{movie_frames[0].scale_x * patch_size_x:0.0f} µm'], ha = 'left' )
else:
# axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
ax.tick_params( axis = 'x', length = 0 )
ax.set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
ax.set_yticks( [] )
if i_t == 0:
ax.set_ylabel( f'{cur_compound}', color = COMPOUND_COLORS[cur_compound] )
# s.label( xaxis_y = 0 )
plt.subplots_adjust( wspace = 0.04, hspace = 0. )We see by eye that t-ACPD evokes a considerably larger response in this patch, and this is reflected in the patch embeddings: while in Figure 5 above we were able to clearly distinguish the subtle modulations induced by the introduction of baclofen, when placed on the same scale (top row) as the changes induced in this same patch by t-ACPD (bottom row), in Figure 11 below, the difference is evident:
Code
from tqdm import tqdm
import atdata
import astrocytes
from astrocytes._datasets._embeddings import (
PatchEmbeddingTrace,
)
#
t_intervene = 300
baseline_window = (-100, -15)
plot_window = (-100, 240)
#
wds_url = (
'https://data.forecastbio.cloud/testing/patch-pc-traces/bath-application/'
+ 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
)
ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )
test_traces_compare = [ None for _ in movies_compare ]
# for trace in tqdm( ds.ordered( batch_size = None ) ):
for trace in ds.ordered( batch_size = None ):
try:
assert trace.metadata is not None and 'uuid' in trace.metadata
assert trace.metadata['uuid'] in movies_compare
i_movie = movies_compare.index( trace.metadata['uuid'] )
if trace.i_patch == test_patch_compare[0] and trace.j_patch == test_patch_compare[1]:
test_traces_compare[i_movie] = trace
# print( 'Caught', trace.metadata['uuid'] )
should_finish = True
for x in test_traces_compare:
if x is None:
should_finish = False
if should_finish:
break
except:
# print( trace.metadata['uuid'], trace.i_patch, trace.j_patch )
continue
#
causal_offset = (84 * dt) / 4.
t_rel = test_traces[0].ts - t_intervene + (causal_offset)
with HouseStyle( grids = True ) as s:
fig, axs = plt.subplots( n_patches, 1,
figsize = (7, 12),
sharex = True,
# sharey = True,
)
for i_series in range( n_patches ):
ax = axs[i_series]
cur_trace = test_traces_compare[i_series]
cur_compound = frame_blocks_compare[i_series][0].applied_compound
values_0_norm, mean_0, std_0 = \
_baseline_normalize( t_rel, cur_trace.values[0, :], baseline_window )
values_1_norm, mean_1, std_1 = \
_baseline_normalize( t_rel, cur_trace.values[1, :], baseline_window )
values_2_norm, mean_2, std_2 = \
_baseline_normalize( t_rel, cur_trace.values[4, :], baseline_window )
filter_plot = (
(t_rel >= plot_window[0])
& (t_rel < plot_window[1])
)
ax.plot( t_rel[filter_plot], values_0_norm[filter_plot], 'C0-', label = 'PC 1' )
ax.plot( t_rel[filter_plot], values_1_norm[filter_plot], 'C2-', label = 'PC 2' )
ax.plot( t_rel[filter_plot], values_2_norm[filter_plot], 'C3-', label = 'PC 5' )
yl = (-6, 85)
ax.fill_between( [0, plot_window[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0 )
ax.plot( plot_window, [0, 0], 'k-', linewidth = 1 )
ax.plot( [plot_window[0], plot_window[0]], yl, 'k-', linewidth = 1 )
ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
# ax.set_yticks( [0, 5, 10] )
ax.set_xlim( plot_window )
ax.set_ylim( yl )
ax.set_ylabel( f'{cur_compound}', color = COMPOUND_COLORS[cur_compound] )
if i_series == (n_patches - 1):
ax.set_xlabel( 'Time (s)' )
# ax.set_ylabel( '∆ (baseline SD)' )
plt.legend( fontsize = 12, loc = 'upper left' )
ax.text( 5, 8.5, '+ Drug', alpha = 0.5 )The fact that these feature axes within patch embedding–space separate responses to the two compounds is not unique to this patch or this cortical slice: in fact, we can average across the responses in all patches in all recordings throughout the entire dataset, in order to view the unique profiles of dynamic responses to baclofen (red) and t-ACPD (blue):
Code
pcs_show_testing = (0, 1, 2, 3, 4, 5, 6, 7)
n_pcs_show = len( pcs_show_testing )
with HouseStyle( grids = True ) as s:
fig, axs = plt.subplots( n_pcs_show // 2, 2,
figsize = (11, 12),
sharex = True,
)
for i_seq, i_pc in enumerate( pcs_show_testing ):
i_ax = i_seq // 2
j_ax = i_seq % 2
ax = axs[i_ax, j_ax]
ts_rel = trace.ts - t_intervene + causal_offset
filter_baseline = (
(ts_rel >= baseline_window[0])
& (ts_rel < baseline_window[1])
)
raw_values = {
c: np.array( [
_baseline_normalize(
ts_rel,
trace.values[i_pc, :],
baseline_window,
)[0]
for trace in traces
] )
for c, traces in compound_traces.items()
}
middle_trace = {
c: np.mean( vs, axis = 0 )
for c, vs in raw_values.items()
}
# print( { c: vs.shape for c, vs in raw_values.items() } )
middle_boots = {
c: boot_stat( vs, lambda x: np.mean( x, axis = 0 ), axis = 0,
n = 200,
verbose = False,
)
for c, vs in raw_values.items()
}
# print( { c: vs.shape for c, vs in middle_boots.items() } )
middle_low = {
c: np.quantile( vs, 0.025, axis = 0 )
for c, vs in middle_boots.items()
}
middle_high = {
c: np.quantile( vs, 0.975, axis = 0 )
for c, vs in middle_boots.items()
}
filter_plot = (
(ts_rel >= plot_window[0])
& (ts_rel < plot_window[1])
)
#
for c in middle_trace.keys():
ax.plot( ts_rel[filter_plot], middle_trace[c][filter_plot], '-',
color = COMPOUND_COLORS[c]
)
ax.fill_between(
ts_rel[filter_plot],
middle_low[c][filter_plot],
middle_high[c][filter_plot],
color = COMPOUND_COLORS[c],
alpha = 0.2,
linewidth = 0.5,
)
xl = plot_window
yl = ax.get_ylim()
# yl = (-6, 6)
ax.plot( xl, [0, 0], 'k-', linewidth = 1 )
ax.fill_between( [0, xl[1]], yl[0], yl[1],
color = 'k',
alpha = 0.05,
linewidth = 0,
)
ax.set_xlim( xl )
ax.set_ylim( yl )
ax.set_ylabel( f'PC {i_pc + 1}' )
ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
fig.subplots_adjust( wspace = 0.2, hspace = 0.1 )Amazingly, we see that these feature axes capture consistent differences in the evoked imaging features between the two compounds, across a large array of different individual mice, slices, and experiments.
The question then arises: are these consistent differences robust enough to decode the compound applied to a particular patch of astrocyte network? We can test this by using a simple generalized linear readout (via logistic regression) of our PC-space to classify the applied compound. To understand in more detail the dynamics of this potential information content about the applied compound, we’ll restrict ourselves to looking at decoding from the activity in a single patch at a single time-point after compound application, and then vary across all time-points within the experiment.
Looking at this decoding performance across time reveals some interesting patterns:
Code
with HouseStyle( grids = True ) as s:
fig, ax = plt.subplots( figsize = (8, 5) )
#
for i_seq, (n_pc_cur, acc_cv_cur) in enumerate( zip( n_pc_try, acc_cv ) ):
middle_trace = np.mean( acc_cv_cur.T, axis = 0 )
middle_boots = boot_stat( acc_cv_cur.T, lambda x: np.mean( x, axis = 0 ), axis = 0,
n = 1_000,
verbose = False
)
middle_low = np.quantile( middle_boots, 0.005, axis = 0 )
middle_high = np.quantile( middle_boots, 0.995, axis = 0 )
ax.plot( ts_rel[idx_iter], middle_trace, label = f'{n_pc_cur} PCs', zorder = -i_seq * 10 )
ax.fill_between( ts_rel[idx_iter], middle_low, middle_high,
alpha = 0.2,
zorder = -1 - i_seq * 10,
)
#
middle_trace = np.mean( acc_cv_shuffle.T, axis = 0 )
middle_boots = boot_stat( acc_cv_shuffle.T, lambda x: np.mean( x, axis = 0 ), axis = 0,
n = 1_000,
verbose = False
)
middle_low = np.quantile( middle_boots, 0.005, axis = 0 )
middle_high = np.quantile( middle_boots, 0.995, axis = 0 )
ax.plot( ts_rel[idx_iter], middle_trace, color = 'k', label = 'Permuted', zorder = -100 )
ax.fill_between( ts_rel[idx_iter], middle_low, middle_high,
color = 'k',
alpha = 0.2,
zorder = -101
)
xl = window_decode
# yl = ax.get_ylim()
yl = (0.36, 1.04)
ax.plot( xl, [0, 0], 'k-', linewidth = 1 )
ax.fill_between( [0, xl[1]], yl[0], yl[1],
color = 'k',
alpha = 0.05,
linewidth = 0,
zorder = -200
)
ax.text( 5, 0.75, '+ Drug', alpha = 0.5, va = 'center' )
ax.set_xlabel( 'Time (s)' )
ax.set_ylabel( f'Decoding accuracy' )
plt.legend( fontsize = 12 )
s.label( xaxis_y = 0.36, data_xlim=(-60, 240))
ax.set_xticks( [-60, 0, 60, 120, 180, 240] )
ax.set_xlim( xl )
ax.set_ylim( yl )Not only are we able to decode quite well the identity of the applied compound from a single ~60µm patch of the astrocyte network, but it is clear that the particular richness of these imaging features is necessary in order to tease the compound differences apart. This indicates that it is not a simple matter of intensity at play here to distinguish systematically between the two compounds: rather, a sparse but rich subspace of the overall DINOv3 image feature patch embedding space contains the information content needed to identify . This may indicate that these image features contain a nontrivial reflection, via the observed calcium activity, of underlying differences in physiological processes recruited by these two distinct pathways (i.e., GABAB vs mGluR3 receptors).
Discussion
This brief foray into analyzing the OpenAstrocytes data demonstrates the remarkable richness present in astrocyte calcium dynamics, as well as the amazing potential that lies in making this data readily available in a format that slots directly into today’s AI-driven biological data science workflows.
We’re particularly interested in highlighting three key patterns we’ve seen here:
- Generalist AI features contain specialized information: While DINOv3 was trained on a very broad image corpus, we were able to see both anatomical features of local astrocyte networks, as well as physiological features of the dynamics of intracellular astrocyte calcium, reflected in small subspaces of these embeddings. This has large implications for the application of these generalist large models to biological data science.
- Heterogeneity of responses across individual fields: It is amazing to see consistency in the responses of particular subspaces to particular drugs; however, we’re very excited to dig into what underlies the differences between responses in specific small pieces of the same slice of brain tissue. We think there’s a lot more about astrocyte biology hidden in the splitting of these differences.
- Decoding performance from rich features: We saw great chemical decoding performance from a single frame of one ~60µm patch of tissue—and this is a pessimistic estimate, given how many patches were unresponsive due to differences in expression of the calcium indicator! There’s a lot to tease apart in the richness of the heterogeneity we saw, particularly in the time-evolution. While we didn’t need to include all of the DINOv3 embedding features to get good decoding, we needed a few; we’re excited about this pattern of results, as it suggests that the decoding performance isn’t arising from some trivial feature (like increased intensity), but instead from a richer combination of features that could generalize to more interesting predictive tasks.
Looking ahead
The OpenAstrocytes corpus is one step in our larger thrust into providing open tools to understand the dynamics of astrocyte physiology, as well as our approach within Forecast to use this incredible science to understand how our chemical tools change brain networks.
Building on atdata
atdata’s typing functionality already helps for keeping datasets sane; but, where keeping track of sample types really shines is when we register mappings between types. As an example, take this from the astrocytes library accompanying this pub:
@atdata.lens
def from_generic( s: ts.Frame ) -> 'BathApplicationFrame':
assert s.metadata is not None
return BathApplicationFrame(
# TODO
applied_compound = 'unknown',
image = s.image,
t_index = s.metadata['frame']['t_index'],
t = s.metadata['frame']['t'],
#
date_acquired = s.metadata['date_acquired'],
movie_uuid = s.metadata['uuid'],
)Under the hood, atdata.lens keeps track of the full registry lenses between sample schema types. Because lenses can have a very nice compositional behavior (9), this design enables us, as we’re building AI models from a bunch of different sources, to easily and automatically aggregate across our diverse hive of WebDatasets, built across different modalities and experiments, taking advantage of the natural ways these data can be compatibly merged.
The full atdata ecosystem is architected to go one step further, placing metadata for type schemas of scientific datasets—and the code for these lenses between them!—in a public, decentralized repository on ATProto, the protocol layer underlying the Bluesky social network. This allows AI coding agents on data science tasks to automatically leverage domain knowledge of structured semantic relationships to decide what data to pull from for a given task.
atdata soon, including our ATProto appview for the full distributed dataset network with MCP support—sign up for updates to hear first as we push more features to the public release!Loving neurons, too
Of course, the brain also is one-third neurons—a fact we certainly wouldn’t want to forget. This is why we’re building out the OpenAstrocytes repository with newly-collated data from the Allen Brain Observatory (10), also formatted with atdata for easy use with existing AI workflows.
from astrocytes.data import allen
from astrocytes.data.allen.schema import BrainObservatoryFrame
dataset = (
allen
.generic.braion_observatory
.dataset
.as_type( BrainObservatoryFrame )
)
# Pull an example sample
sample = next( x for x in dataset.shuffled( batch_size = None ) )
with HouseStyle() as s:
fig, ax = plt.figure( figsize = (8, 8) ))
plt.imshow( sample.image )We’re continuing to collate this fantastic dataset’s movies of raw two-photon calcium imaging from neurons in visual cortex, as it provides another phenomenal source of neurobiological imaging data for augmenting fine-tuning workflows; since neuronal and astrocytic imaging share tremendous overlap in latent image statistics, we anticipate that combining large datasets of similar dynamic imaging from the two cell types will provides immense benefit to the study of each.
A new kind of data
At Forecast, we’re building out a new experimental dataset taken not from , but instead from 3D in vitro systems built from human induced pluripotent stem cells. Human astrocyte dynamics are a near-completely unexplored frontier within neuroscience; we are extremely excited about the potential lying within this system, as we build out OpenAstrocytes in the next few months with more and more data taken from this completely blank canvas. Stay tuned ;).
Acknowledgements
We’d like to thank Dr. Michelle Cahill and Dr. Michael Reitman for driving the original experimental work and data collection for the images that went into the initial release of OpenAstrocytes presented here.
Support for the production of OpenAstrocytes at Forecast was generously provided by the Special Initiatives division of the Astera Institute.
Citation
Please reference this work in BibTeX as:
@article{levesque2025openastrocytes,
author = {Maxine Levesque and Kira Poskanzer},
title = {OpenAstrocytes},
journal = {Forecast Research},
year = {2025},
note = {https://forecast.bio/research/open-astrocytes/},
}AI use
No language model tools were used to produce any of the code for or writing in this pub.
Copyright
Copyright © 2025 Forecast Bio, Inc. The contents of this article are licensed under the Creative Commons CC-BY 4.0 license.